The Integration of a Part-of-Speech Tagger into the ALEP Platform
Abstract
We describe how part-of-speech information delivered by a tagger (the MPRO tool) has been integrated into the ALEP (Advanced Language Engineering Platform) system. To do this, we extended an approach described within the LS-GRAM project, which consisted in defining the Text Handling component of ALEP in such a way that so-called "messy details" are handled within this subsystem, hence keeping the (linguistic) parser free from such tasks. We extended the tagging strategy used for this purpose to normal words and modified the default tagging of words proposed by the ALEP system in order to incorporate the information delivered by the part-of-speech tagger. The resulting tagging is converted by means of "lift" rules into partial linguistic descriptions, which provide the direct input to the grammatical analysis. We show that this procedure substantially reduces the parse times of the system.